28 research outputs found
Deep Task-specific Bottom Representation Network for Multi-Task Recommendation
Neural-based multi-task learning (MTL) has achieved significant improvements
and has been successfully applied to recommender systems (RS). Recent deep
MTL methods for RS (e.g., MMoE, PLE) focus on designing soft gating-based
parameter-sharing networks that implicitly learn a generalized representation
for each task. However, MTL methods may suffer from performance degeneration
when dealing with conflicting tasks, as negative transfer effects can occur
on the task-shared bottom representation. This reduces the capacity of MTL
methods to capture task-specific characteristics, ultimately impeding their
effectiveness and hindering generalization across all tasks.
In this paper, we focus on the bottom representation learning of MTL in RS and
propose the Deep Task-specific Bottom Representation Network (DTRN) to
alleviate the negative transfer problem. DTRN obtains task-specific bottom
representations explicitly by giving each task its own representation
learning network in the bottom representation modeling stage. Specifically,
it extracts the user's interests from multiple types of behavior sequences
for each task through a parameter-efficient hypernetwork. To further obtain a
dedicated representation for each task, DTRN refines the representation of
each feature by employing a SENet-like network for each task. Together, these
two modules produce task-specific bottom representations that relieve mutual
interference between tasks. Moreover, the proposed DTRN can be flexibly
combined with existing MTL methods. Experiments on one public dataset and one
industrial dataset demonstrate the effectiveness of the proposed DTRN.
Comment: CIKM'2
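The SENet-like per-task feature refinement mentioned above can be sketched as
a squeeze-and-excitation gate over field embeddings; the shapes, weight names,
and pooling choice below are illustrative assumptions, not the paper's actual
implementation.

```python
import numpy as np

def senet_refine(feature_embs, w1, w2):
    """SENet-style squeeze-and-excitation over feature embeddings.
    feature_embs: (num_fields, emb_dim) array of field embeddings.
    w1: (num_fields, reduced), w2: (reduced, num_fields) excitation weights.
    Returns reweighted embeddings of the same shape."""
    # Squeeze: summarize each field by mean-pooling its embedding.
    z = feature_embs.mean(axis=1)                 # (num_fields,)
    # Excitation: two-layer bottleneck producing one weight per field.
    a = np.maximum(z @ w1, 0.0)                   # ReLU
    s = 1.0 / (1.0 + np.exp(-(a @ w2)))           # sigmoid gate, (num_fields,)
    # Re-scale: each field embedding is multiplied by its own gate.
    return feature_embs * s[:, None]
```

In a multi-task setting, each task would hold its own `w1`/`w2`, so the same
shared features are reweighted differently per task.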
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure
Ad creatives are one of the prominent mediums for online e-commerce
advertisements. Ad creatives with an appealing visual appearance may increase
the click-through rate (CTR) of products. Ad creatives are typically
handcrafted by advertisers and then delivered to advertising platforms for
advertisement. In recent years, advertising platforms have become capable of
instantly compositing ad creatives from arbitrarily designated elements of
each ingredient, so advertisers are only required to provide basic materials.
While this facilitates advertisers, it means a great number of potential ad
creatives can be composited, making it difficult to accurately estimate their
CTR given limited real-time
feedback. To this end, we propose an Adaptive and Efficient ad creative
Selection (AES) framework based on a tree structure. The tree structure on
compositing ingredients enables dynamic programming for efficient ad creative
selection on the basis of CTR. Due to limited feedback, the CTR estimator is
usually of high variance. Exploration techniques based on Thompson sampling are
widely used to reduce the variance of the CTR estimator and alleviate
feedback sparsity. Based on the tree structure, Thompson sampling is adapted
with
dynamic programming, leading to efficient exploration for potential ad
creatives with the largest CTR. We finally evaluate the proposed algorithm on
the synthetic dataset and the real-world dataset. The results show that our
approach can outperform competing baselines in terms of convergence rate and
overall CTR.
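Under the simplifying assumption that a creative's CTR factorizes over
independent ingredients (a stronger assumption than the paper's tree model),
one round of Thompson sampling reduces to a per-ingredient argmax over
sampled CTRs, which is the degenerate form of the dynamic-programming step.
The data structures and names below are illustrative, not the paper's.

```python
import random

def select_creative(posteriors):
    """One Thompson-sampling round over composited creatives.
    posteriors: dict ingredient -> list of (alpha, beta) Beta posteriors,
    one per candidate element of that ingredient.
    Returns a dict mapping each ingredient to the chosen element index."""
    creative = {}
    for ingredient, params in posteriors.items():
        # Sample a plausible CTR contribution for every candidate element.
        samples = [random.betavariate(a, b) for a, b in params]
        # DP step (independent ingredients): keep the best element.
        creative[ingredient] = max(range(len(samples)), key=samples.__getitem__)
    return creative

def update(posteriors, creative, clicked):
    """Bayesian update of the chosen elements' Beta posteriors
    from one binary click observation."""
    for ingredient, idx in creative.items():
        a, b = posteriors[ingredient][idx]
        posteriors[ingredient][idx] = (a + clicked, b + (1 - clicked))
```

With a genuine tree over compositing ingredients, the argmax per ingredient
would be replaced by a bottom-up maximization over subtrees.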
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
Text design is one of the most critical procedures in poster design, as it
relies heavily on the creativity and expertise of humans to design text images
considering visual harmony and text semantics. This study introduces
TextPainter, a novel multimodal approach that leverages contextual visual
information and corresponding text semantics to generate text images.
Specifically, TextPainter takes the global-local background image as a hint of
style and guides the text image generation with visual harmony. Furthermore, we
leverage the language model and introduce a text comprehension module to
achieve both sentence-level and word-level style variations. Besides, we
construct the PosterT80K dataset, consisting of about 80K posters annotated
with sentence-level bounding boxes and text contents. We hope this dataset will
pave the way for further research on multimodal text image generation.
Extensive quantitative and qualitative experiments demonstrate that TextPainter
can generate visually and semantically harmonious text images for posters.
Comment: Accepted to ACM MM 2023. Dataset Link:
https://tianchi.aliyun.com/dataset/16003
Geometry Aligned Variational Transformer for Image-conditioned Layout Generation
Layout generation is a novel task in computer vision that combines the
challenges of object localization and aesthetic appraisal, and is widely used
in the design of advertisements, posters, and slides. An accurate and
pleasant layout
should consider both the intra-domain relationship within layout elements and
the inter-domain relationship between layout elements and the image. However,
most previous methods simply focus on image-content-agnostic layout generation,
without leveraging the complex visual information from the image. To this end,
we explore a novel paradigm entitled image-conditioned layout generation, which
aims to add text overlays to an image in a semantically coherent manner.
Specifically, we propose an Image-Conditioned Variational Transformer (ICVT)
that autoregressively generates various layouts in an image. First, a
self-attention mechanism is adopted to model the contextual relationships
within layout elements, while a cross-attention mechanism is used to fuse the
visual information of conditional images. Subsequently, we take them as the
building blocks of a conditional variational autoencoder (CVAE), which
demonstrates appealing diversity. Second, to alleviate the gap between the
layout element domain and the visual domain, we design a Geometry Alignment
module, in
which the geometric information of the image is aligned with the layout
representation. In addition, we construct a large-scale advertisement poster
layout designing dataset with delicate layout and saliency map annotations.
Experimental results show that our model can adaptively generate layouts in the
non-intrusive area of the image, resulting in a harmonious layout design.
Comment: To be published in ACM MM 202
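The cross-attention fusion described above, in which layout tokens query the
conditional image features, can be sketched generically as single-head scaled
dot-product attention; the weight matrices are illustrative stand-ins, not
the ICVT parameters.

```python
import numpy as np

def cross_attention(layout_tokens, image_tokens, wq, wk, wv):
    """Single-head cross-attention: layout element tokens (L, d) attend
    over conditional image tokens (N, d) via projections wq, wk, wv (d, d).
    Returns fused layout representations of shape (L, d)."""
    q = layout_tokens @ wq
    k = image_tokens @ wk
    v = image_tokens @ wv
    # Scaled dot-product scores between layout queries and image keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable row-wise softmax over image positions.
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Each layout token becomes a weighted mix of image values.
    return attn @ v
```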
Hierarchical Masked 3D Diffusion Model for Video Outpainting
Video outpainting aims to adequately complete missing areas at the edges of
video frames. Compared to image outpainting, it presents an additional
challenge as the model should maintain the temporal consistency of the filled
area. In this paper, we introduce a masked 3D diffusion model for video
outpainting. We use the technique of mask modeling to train the 3D diffusion
model. This allows us to use multiple guide frames to connect the results of
multiple video clip inferences, thus ensuring temporal consistency and reducing
jitter between adjacent frames. Meanwhile, we extract the global frames of
the video as prompts and guide the model to obtain information beyond the
current video clip via cross-attention. We also introduce a hybrid
coarse-to-fine inference pipeline to alleviate the artifact accumulation
problem. The existing coarse-to-fine pipeline only uses the infilling strategy,
which causes degradation because the time interval between sparse frames is too
large. Our pipeline benefits from bidirectional learning of the mask modeling
and thus can employ a hybrid strategy of infilling and interpolation when
generating sparse frames. Experiments show that our method achieves
state-of-the-art results in video outpainting tasks. More results are
provided at our project page: https://fanfanda.github.io/M3DDM/.
Comment: ACM MM 2023 accepted
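The mask-modeling idea above, randomly keeping some frames as clean guide
frames and masking the rest so the model learns bidirectional completion, can
be sketched as follows; `p_guide` is a hypothetical hyperparameter, not a
value from the paper.

```python
import random

def sample_frame_mask(num_frames, p_guide=0.5):
    """Randomly mark each frame as a guide frame (kept clean) or a
    masked frame (to be generated), so training covers arbitrary
    guide-frame positions. Returns a list of bools, True = masked."""
    mask = [random.random() >= p_guide for _ in range(num_frames)]
    # Ensure at least one frame is masked so there is something to learn.
    if not any(mask):
        mask[random.randrange(num_frames)] = True
    return mask
```

At inference, the guide positions would instead be fixed to the frames shared
between adjacent clips, connecting the clip-by-clip results.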
AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation
Advertising posters, a form of information presentation, combine visual and
linguistic modalities. Creating a poster involves multiple steps and
necessitates design experience and creativity. This paper introduces
AutoPoster, a highly automatic and content-aware system for generating
advertising posters. With only product images and titles as inputs, AutoPoster
can automatically produce posters of varying sizes through four key stages:
image cleaning and retargeting, layout generation, tagline generation, and
style attribute prediction. To ensure visual harmony of posters, two
content-aware models are incorporated for layout and tagline generation.
Moreover, we propose a novel multi-task Style Attribute Predictor (SAP) to
jointly predict visual style attributes. Meanwhile, to the best of our
knowledge, we present the first poster generation dataset that includes
visual attribute annotations for over 76k posters. Qualitative and
quantitative outcomes from
user studies and experiments substantiate the efficacy of our system and the
aesthetic superiority of the generated posters compared to other poster
generation methods.
Comment: Accepted for ACM MM 202
Graph cuts for supervised binary coding
Learning short binary codes is challenged by the inherent discrete nature of
the problem. The graph cuts algorithm is a well-studied discrete label
assignment solution in computer vision, but has not yet been applied to
binary coding problems. This is partially because it was unclear how to use
it to learn the encoding (hashing) functions needed for out-of-sample
generalization. In this paper, we formulate supervised binary coding as a
single optimization problem that involves both the encoding functions and the
binary label assignment. We then apply the graph cuts algorithm to the
discrete optimization problem involved, with no continuous relaxation. This
method, named Graph Cuts Coding (GCC), shows competitive results on various
datasets.
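For the out-of-sample step, once hash functions have been learned, encoding a
new sample reduces to thresholding linear projections. The sketch below shows
only that step; the weight matrix `W` is an illustrative stand-in, and
learning it jointly with the graph-cuts label assignment is the paper's
actual contribution.

```python
import numpy as np

def encode(X, W):
    """Encode samples X (n, d) into binary codes using linear hash
    functions W (d, num_bits): bit_j = 1 if x . w_j > 0 else 0."""
    return (X @ W > 0).astype(np.uint8)

def hamming(c1, c2):
    """Hamming distance between two binary codes."""
    return int(np.sum(c1 != c2))
```

Retrieval then compares codes by Hamming distance, which is why short codes
with good label agreement are the optimization target.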